(54) Interactive audio entertainment apparatus



(57) The speech of a group of virtual characters is 
modelled using a directed graph or audio web, where 
each utterance is a node (A1-A7, B1-B5) in the web 
having links from it to each possible utterance that can 
be made in reply at any point in the course of the inter- 
action. The choice of the next node (conversation seg- 
ment) is dependent on attributes of the character and/or 
the virtual environment in which the conversation is taking
place. These attributes may be fixed, modified by
system function calls available at each node, or modi- 
fied in response to user input. 
Description

The present invention relates to entertainment systems and in particular to game and narrative systems where one
or more computer-modelled characters traverse a virtual environment, interacting with the environment and other char- 
acters within the environment, under direct or indirect control of a user. 

Character-based games or interactive drama have been modelled as a network of inter-connected pre-recorded 
scenes in a "story-net" arrangement of branching nodes. At each node in the network, a decision is required as to which 
of three or four possible scenes will be played out next, with there being a common direction of travel through the scripted
narrative such that the user will eventually reach the, or one of a number of possible, final scenes. Examples of different 
narrative structures, from the non-interactive "linear" form through the introduction of various effects and game-playing
elements to full branching story structures, are described in "Flawed Methods for Interactive Storytelling" by Chris 
Crawford, Interactive Entertainment Design, Vol 7, No 4, April 1994. The problems of the vast amount of data required 
to provide even a few alternative paths through a story structure as opposed to the constraints of traditional (linear)
narrative techniques are discussed and, whilst a solution is claimed, the details are withheld. These problems of
interactivity against storyline are further discussed in an article by the same author "Interactivity, Plot, Free Will, Determinism,
Quantum Mechanics and Temporal Irreversibility" Interactive Entertainment Design, Vol 8, No 4, April 1995. 

Considering just the audio side of interactive entertainment, the character based games or interactive drama 
require dialogue between the characters: speech is the easiest way to convey meaning and feeling and thus generate 
a feeling of immersion for the user by relating to the characters. In traditional systems, such as those described by 
Crawford, this has been done using long pre-recorded sequences of speech providing the soundtrack to the selected
scene following each branch point of the story. Even as a purely audio entertainment, such systems are still tied to the 
ongoing branch structure which, in addition to having long periods passing without variation, is still prohibitive in the 
amount of data required to be stored for even relatively few branches. 

It is therefore an object of the present invention to provide a character-based audio entertainment system with 
increased opportunity for variation whilst requiring less audio data to be stored than heretofore conventional.

In accordance with the present invention, there is provided an entertainment apparatus comprising network data 
and character data stores coupled with a data processor; the network data store holding data defining a network of for- 
ward-linked nodes, with at least some nodes having forward links to two or more further nodes, and with the data proc- 
essor being configured for the sequential selection of linked nodes of the network; and the character data store 
containing at least one stored attribute value and selection from said links to at least two further nodes being determined
by said attribute value; characterised in that the apparatus further comprises an audio data store containing a
plurality of data streams defining a respective audio signal segment for each node of the network, the audio data store 
being coupled with the network data store and processor and controlled to output an audio signal segment data stream 
on selection of the respective node. 
By use of the network of interlinked nodes, each audio segment (speech or sound effects) needs to be stored only
once, with the respective node being selected each time that segment is required. The character data store suitably 
holds a plurality of stored attribute values with the selection from at least two further nodes being determined by the sat- 
isfaction of respective different combinations of attribute settings, at least one of which is suitably a default setting cho- 
sen in the event that none of the specified combinations is satisfied. This "audio web" may be used to generate 
conversations between two or more virtual characters, each with their own set of "personality" attributes in the character
store, the setting of these attributes defining the path taken by the conversation at each node. 

The apparatus may further comprise a function call store holding a plurality of commands each specifying a respec- 
tive processor function, each function call being associated with at least one node of the network and being called and 
implemented by the processor on selection of the or each associated node. The processor may be controlled to modify 
at least one stored attribute value from the character data store in response to at least one function call. By use of these
function calls, the characters' conversation path becomes more dependent on their reaction to the conversation and 
their fellow characters. One or more further nodes may be defined in the network, each with an associated function call 
but without a respective audio signal segment. These nodes without audio may be used as "stage directions" to alter 
features of the virtual environment or the characters. 
User-operable input means may be provided coupled to the processor, with the user input affecting the selection at
one or more of the nodes, suitably by the processor altering the setting of at least one of the stored attribute values. By 
initially setting or continually changing the attribute values, the user can vary the characters' "mood" and then hear how
this affects the path of the conversation and the characters' inter-relationships. 

The apparatus may output the audio data to a separate unit, or may include audio signal reproduction apparatus 
coupled to the audio data store, receiving the output data streams and reproducing the audio signal segments.

The invention also provides a removable storage device, such as a tape or an optical disc, acting as the audio data 
store and holding the audio signal segment data streams together with the data of the network data store which data 
includes indexing information (such as track and sector addresses) identifying the respective storage locations of each 
audio signal segment data stream. With the use of a removable storage device, the apparatus would include suitable 
access means such as a disc reader. At the start of a session, the network data and accompanying indexing information
are downloaded such that the selected audio segments may then be called with their track addresses such as to mini- 
mise access delays. 

Further features and advantages of the present invention will become apparent from reading of the following 
description of preferred embodiments of the invention, for the purposes of example, with reference to the accompanying
drawings in which: 

Figure 1 illustrates a small section of a network of audio (speech) segments; and 
Figure 2 is a block schematic diagram of apparatus embodying the invention. 


The following description is of audio web "story telling" apparatus where characters converse with each other whilst 
interacting in a virtual world. The path taken by the characters' conversation is not scripted but instead traverses a
networked collection of nodes, each comprising a short audio segment of spoken word or sound effects. At each node a
decision is taken as to which of a number of possible branch paths (leading to respective further nodes) will be taken.
The decision is taken on the basis of various conditions applied at each node and on the basis of attributes of the
various characters as they stand at that node, as will be described.

Figure 1 illustrates a small section of the audio network with nodes A1-A7 containing speech segments of a first 
character and nodes B1-B5 containing speech segments of a second character. In the first node A1, the first character
says "Hello": the choice of the next node (B1 or A2) depends on what conditions are satisfied at node A1. For example,
the deciding condition may be associated with an attribute of the second character which equates to whether or not it
feels talkative. If the attribute is currently set to "talkative", the branch path to node B1 will be selected and the second 
character will reply "Hello", otherwise the first character has to continue talking, asking "How are you?" at node A2. 

The choice at node A2 depends on different conditions such that, although the second character is not feeling
talkative, he will answer a direct question. The determining condition at node A2 may be the current setting of an attribute
of the second character relating to his general well-being such that he will reply "Not too well today" (node B2) or "Fine
thanks, how are you?" (node B3) depending on the setting. The setting of the corresponding attribute of the first
character may decide its response to the question in node B3.

An effective attribute for enhancing the realism of the character interaction is the like/dislike "opinion" of the other
character. Whilst some attributes may retain fixed values throughout the characters' interaction, others such as like/dislike
are suitably dependent on the results of preceding interactions. These variable attributes may simply be "on" or "off"
or may be more complex. With an initial attribute of 50 on a possible scale of 0-100, for example (i.e. ambivalent to the
second character), the branch selection at node A1 may entail an increase of 10 if the branch via node B1 is chosen
and a reduction of 5 if the direct branch to node A2 is chosen, with the attribute level change being the result of a function
call at the destination node A2. The result is a like/dislike attribute setting of 60 or 45 depending on whether the second
character "felt talkative" at the decision point of node A1. The like/dislike attribute setting may then be the deciding factor
at node B2: if the setting is 50 or above, the first character has an ambivalent or friendly attitude to the second and will
reply "Oh dear" (node A4) in response to hearing that the second character is unwell. With the attribute setting of 45,
however, the first character has "taken a dislike" to the second and will instead respond "Hard luck" at node A3.
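
The arithmetic of this example can be traced in a few lines. The following Python sketch is illustrative only: the function names are hypothetical, and the attribute change is simply applied on the way to node A2 rather than through a stored function call.

```python
from __future__ import annotations


def greet(second_is_talkative: bool, like: int = 50) -> tuple[list[str], int]:
    """Walk nodes A1 -> (B1) -> A2; return the path and the updated like/dislike value."""
    path = ["A1"]                      # first character says "Hello"
    if second_is_talkative:
        path.append("B1")              # second character replies "Hello"
        like += 10                     # branch via B1: increase of 10
    else:
        like -= 5                      # direct branch to A2: reduction of 5
    path.append("A2")                  # first character asks "How are you?"
    return path, like


def reply_to_unwell(like: int) -> str:
    """Decision at node B2: a setting of 50 or above is ambivalent or friendly."""
    return "A4" if like >= 50 else "A3"    # "Oh dear" versus "Hard luck"


path, like = greet(second_is_talkative=False)
print(path, like, reply_to_unwell(like))
```

With a talkative second character the setting rises to 60 and node A4 follows node B2; otherwise it falls to 45 and node A3 is chosen.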

The re-using of segments is shown by node A6 ("Never mind") which, in this example, follows both of nodes A3 and
A4. In effect, in response to node B2 the first character may say "Hard luck, never mind" or "Oh dear, never mind". There
is a trade-off to be made here, with the saving in stored audio being offset by the additional call to the audio store
necessary at node A3 or A4. Short phrases or even single words enable greatest re-usability but increase the complexity
of the network, and access times for the audio store may affect realism with short but noticeable pauses between
successively called segments. Suitably, the audio segments may be split at punctuation points where a slight hesitation will
sound more natural.

As can be seen from Figure 1, the speaking of the same word/phrase by the different characters is handled with
separate nodes (e.g. A1 and B2, A4 and B5) since the audio segments in this instance are pre-recorded: the same also
applies to different intonations of a given phrase by one character. Where the audio is generated from text by high
quality speech synthesis apparatus able to handle different intonations/accents, phrases need only be stored once. This,
however, leads to another trade-off: whilst there is a reduction in the number of nodes, the branching structure becomes
more complex and the number of conditions at each node increases with the need to determine which character is 
speaking and their intonation attribute settings before moving to assess the conditions determining branch selection to 
the next node. 

The characters' "behaviour" will be dependent on the way in which they are modelled. In a basic implementation,
they may simply be defined in terms of an accent which responds in a set way defined by the node conditions based on
current attribute settings. Preferably, however, the individual characters are modelled as a complex collection of
inter-related behaviours which may develop apparent personality traits as the interaction with the story web proceeds.
Rule-based programming languages are preferred to enable linking of, for example, character attributes as modified by the
passage through the nodes with other, internally held attributes. With such linking, the network-directed alteration of
attribute levels may result in alteration of other attributes. For example, a character may have the network-directed
attributes happy/sad and (other character) like/dislike which, in the basic implementation are independent. In the more 
complex character, an increase in the "happiness" attribute setting may result in an increase in the "like" attribute
setting, as though the character was taking a kinder view of its fellows as its mood improved. In a development, the
character may have an internal attribute (for example, fairness), not directly controlled by the function calls (which may
themselves be programmed rules) of the audio web, the setting of which determines the extent to which the setting of
the happy/sad attribute affects the setting of the like/dislike attribute. Such internal attributes may be set by the user or
may be pre-set. Alternatively they may be affected by the characters' interactions and their results: for example, a
consistently high like/dislike attribute setting may generate "trust" or "suspicion" in the character with a consequent raising
or lowering of the fairness internal attribute setting.
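
One way such linked attributes might behave is sketched below in Python. This is an assumption-laden illustration and not the rule-based language preferred in the description: the class and method names are hypothetical, fairness is assumed to lie between 0 and 1, and the coupling between happy/sad and like/dislike is assumed to be a simple linear scaling.

```python
from dataclasses import dataclass


def clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep an attribute setting on the 0-100 scale."""
    return max(low, min(high, value))


@dataclass
class Character:
    happy: float = 50.0      # network-directed happy/sad attribute
    like: float = 50.0       # network-directed like/dislike attribute
    fairness: float = 0.5    # internal attribute (assumed range 0-1)

    def change_happiness(self, delta: float) -> None:
        """Apply a network-directed mood change and let it spill into like/dislike."""
        self.happy = clamp(self.happy + delta)
        # Assumed linear coupling: fairness sets how strongly mood affects opinion.
        self.like = clamp(self.like + self.fairness * delta)


character = Character()
character.change_happiness(10)
print(character.happy, character.like)
```

Setting fairness to 0 reproduces the basic implementation, in which the two attributes are independent.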

A suitable real-time rule-based language for specifying such inter-related groups of attributes and their influence on 
each other is Real Time ABLE (RTA), described in Proceedings of the European Simulation Multiconference 1991 at
pages 226-231. Further examples of the features of rule-based languages are given in "The Behaviour Language;
User's Guide" by R A Brooks, AI Memo 1227, Massachusetts Institute of Technology, Artificial Intelligence Laboratory,
April 1990.

Figure 2 shows apparatus embodying the invention, comprising a processor 10 linked to a number of data stores 
12, 14, 16, 18, 20, 22 via data bus 24, together with an audio playback stage 26 and user input device 28. The first store,
audio store 12, holds all of the speech and sound segments associated with the nodes of the network and outputs them,
when called, to the audio playback stage 26 for direct play. The audio store may be built into the apparatus or may be
provided on a removable storage device such as an optical disc with the processor 10 accessing the data via a suitable
reader. Suitably the network description is also downloaded to the network store 14 (to be described) from the storage 
device together with indexing information such that each node is associated with the storage address of the associated 
audio segment. The processor 10 can then call for the audio segments from store by directly addressing them, thereby 
keeping access delays to a minimum. 
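
The indexing information can be pictured as a lookup table loaded once at the start of a session. The Python sketch below is a minimal illustration under assumed details: the row layout (node number, track, sector), the address values and the names `SegmentAddress` and `load_index` are all hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class SegmentAddress(NamedTuple):
    track: int
    sector: int


def load_index(rows: list[tuple[int, int, int]]) -> dict[int, SegmentAddress]:
    """Build a node-number -> storage-address table from (node, track, sector) rows."""
    return {node: SegmentAddress(track, sector) for node, track, sector in rows}


# Illustrative rows only; real addresses would come from the storage device.
INDEX_ROWS = [(2026, 3, 120), (2028, 3, 128), (1810, 7, 16)]
index = load_index(INDEX_ROWS)
print(index[2028])
```

Because the table is held in memory, selecting a node yields its address directly, without searching the storage device.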

The next store, network store 14, holds the script web, a file defining the network interlinking the nodes. A fragment
of the script web file may be as shown in the example below: 



File:            Explanation:

#2026, 14        Node number 2026. Call function number 14 if this node is selected.
>1150, 3,6,1     Go to node 1150 if conditions 3, 6 and 1 are satisfied.
>2040, 3,2       Go to node 2040 if conditions 3 and 2 are satisfied.
>2028            Go to node 2028 if other branch conditions are unsatisfied.
#2028            Node 2028. No attached functions.
>2100, 3,2,4     Go to node 2100 if conditions 3, 2 and 4 are satisfied.
>1150, 3,2,1     Go to node 1150 if conditions 3, 2 and 1 are satisfied.
>2040, 3,2       Go to node 2040 if only conditions 3 and 2 are satisfied.
>1810            Default to node 1810.



When a node is selected in CPU 10, the node number addresses the appropriate segment of stored audio (via address
line ADDR) in audio store 12.
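
A script web fragment of the form shown above could be read into memory along the following lines. This Python sketch is not part of the described apparatus: the `Node` and `Link` types and the function `parse_script_web` are hypothetical, and the parser assumes that "#" lines open a node and ">" lines list a target node followed by its condition numbers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Link:
    target: int
    conditions: list[int]              # an empty list marks the default branch


@dataclass
class Node:
    number: int
    function: int | None = None        # function call number, if any
    links: list[Link] = field(default_factory=list)


def parse_script_web(text: str) -> dict[int, Node]:
    """Parse '#node, function' and '>target, conditions...' lines into Node records."""
    nodes: dict[int, Node] = {}
    current: Node | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] not in "#>":
            continue                   # skip blank or unrecognised lines
        numbers = [int(part) for part in line[1:].split(",") if part.strip()]
        if line[0] == "#":
            current = Node(numbers[0], numbers[1] if len(numbers) > 1 else None)
            nodes[current.number] = current
        elif current is not None:
            current.links.append(Link(numbers[0], numbers[1:]))
    return nodes


FRAGMENT = """
#2026, 14
>1150, 3,6,1
>2040, 3,2
>2028
#2028
>2100, 3,2,4
>1150, 3,2,1
>2040, 3,2
>1810
"""

web = parse_script_web(FRAGMENT)
print(web[2026].function, [link.target for link in web[2026].links])
```

Keeping the links in file order preserves the sequence in which the branch conditions are listed, with the unconditional default last.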

The next two stores 16,18 hold respectively the definitions of the condition tests to be applied and the current 
attribute settings for each character. The condition tests may be along the lines of "is attribute A greater than 50?" where 
attribute A is the like/dislike attribute described previously, the current value of which is held in character store 18. 
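
The interplay of the condition store and the character store can be sketched as follows. The meanings attached to condition numbers 1 to 4 are invented for illustration, as are the attribute names, and the rule that the first link whose conditions all hold is taken (with an unconditional link acting as the default) is an assumption about how the listed branches are evaluated.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical condition store: condition number -> test on the attribute settings.
CONDITIONS: dict[int, Callable[[dict], bool]] = {
    1: lambda attrs: attrs["like"] > 50,            # "is attribute A greater than 50?"
    2: lambda attrs: attrs["talkative"],
    3: lambda attrs: attrs["well"],
    4: lambda attrs: attrs["environment"] == "prison cell",
}

# Links of node 2028 from the fragment above: (target node, required conditions).
LINKS_2028 = [(2100, [3, 2, 4]), (1150, [3, 2, 1]), (2040, [3, 2]), (1810, [])]


def select_next(links: list[tuple[int, list[int]]], attrs: dict) -> int | None:
    """Return the first target whose conditions are all satisfied, else None."""
    for target, required in links:
        if all(CONDITIONS[number](attrs) for number in required):
            return target
    return None


attrs = {"like": 60, "talkative": True, "well": True, "environment": "cave"}
print(select_next(LINKS_2028, attrs))
```

Lowering the like/dislike setting to 45 leaves only conditions 3 and 2 satisfied, so node 2040 is chosen instead of node 1150; if the character is not talkative, the default node 1810 is taken.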

The definition of the various functions which may be called at a node is held in function store 20. These functions
may apply to the characters themselves, as variations in one or more attribute settings as appropriate to the character
learning or giving the information contained in the audio segment associated with the node at which the segment is 
called. The functions may also affect the virtual world within which the conversation takes place. The interaction 
between the characters may be improved if it moves between different "physical" environments, as defined in a model
of the various virtual world scenarios held in world store 22, with the current environment in turn being the subject of
one or more conditions. For example, a current environment of a prison cell would satisfy a condition leading to choice 
of the subsequent node audio segment "How do we get out of here?" rather than "I like it here". 

The different scenarios would themselves be inter-linked, with both character access to, and movement between 
the different scenarios being determined by the particular point in, and past passage through, the audio network. On a 
direct level, the entry to a closed environment may require the character or characters to have passed through the node
where a password was given. On an indirect level, movement into a different area of the network may trigger the move- 
ment to a new environment. The environment itself may be used to trigger audio effects via the CPU 10 and generated 
within the playback stage 26, for example echo when the conversation is taking place in a cave or dungeon scenario.

Those features which alter the attributes of the scenarios, including their selection from the virtual world, may be
called by further nodes of the network without associated audio segments. These further nodes act as "stage
directions" such as "open the door": stage direction nodes may, of course, have associated audio segments, such as sound
effects, and these are incorporated as with the conversation segments. 
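
The handling of a stage direction node alongside ordinary nodes might look like the Python sketch below. Everything named here is hypothetical: the node numbers, the file names standing in for audio segments, and the `open_door` function are chosen purely for illustration.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical audio store: node number -> audio segment (node 300 has none).
AUDIO: dict[int, str] = {301: "door_creak.wav", 302: "how_do_we_get_out.wav"}


def open_door(world: dict) -> None:
    """Stage direction: alter a feature of the virtual environment."""
    world["door"] = "open"


# Hypothetical function store: node number -> function call.
FUNCTIONS: dict[int, Callable[[dict], None]] = {300: open_door}


def enter_node(node: int, world: dict, played: list[str]) -> None:
    """Run the node's function call (if any), then output its audio (if any)."""
    function = FUNCTIONS.get(node)
    if function is not None:
        function(world)
    segment = AUDIO.get(node)
    if segment is not None:            # stage direction nodes may have no audio
        played.append(segment)


world = {"door": "closed"}
played: list[str] = []
for node in (300, 301, 302):
    enter_node(node, world, played)
print(world, played)
```

Node 300 changes the environment silently, whereas nodes 301 and 302 contribute a sound effect and a speech segment in the usual way.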

The form of the user input device 28 will depend on the level of user interaction supported. At a basic level, the user 
input may be simply the initial selection of a cast of players or environments from a menu of those available. Increased
levels of participation may involve initial or ongoing manipulation of character attributes (external or internal) with the 
user having a slider control on which he sets, for example, the like/dislike attribute of his "adopted" character in relation 
to his perception of another character of the narrative. 

As will be recognised, the above-described techniques and apparatus might be extended to incorporate video
although a detailed discussion is beyond the scope of the present invention. With the web navigation being handled in
dependence on the audio links, additional considerations for added video include the problems of audio/video
synchronisation, particularly to short (e.g. single word) audio segments, and the possibility of conversation segments occurring
in more than one virtual environment. Computer-generated characters and environments may alleviate some of these
problems: others may require less direct visual representations such as a still image of the current environment with
overlaid computer-generated images of the characters' faces which change each time an appropriate attribute setting
changes.

From reading the present disclosure, other modifications will be apparent to persons skilled in the art. Such
modifications may involve other features which are already known in the field of interactive entertainment apparatus and
component parts thereof and which may be used instead of or in addition to features already described herein. Although
claims have been formulated in this application to particular combinations of features, it should be understood that the
scope of the disclosure of the present application also includes any novel feature or any novel combination of features
disclosed herein either explicitly or implicitly, whether or not it relates to the same invention as presently claimed in any
claim and whether or not it mitigates any or all of the same technical problems as does the present invention. The
applicants hereby give notice that new claims may be formulated to such features and/or combinations of such features
during the prosecution of the present application or of any further application derived therefrom.

Claims 

1. Entertainment apparatus comprising network data and character data stores coupled with a data processor;

the network data store holding data defining a network of forward-linked nodes, with at least some nodes having
forward links to two or more further nodes, and with the data processor being configured for the sequential
selection of linked nodes of the network; and

the character data store containing at least one stored attribute value and selection from said links to at least
two further nodes being determined by said attribute value;

characterised in that the apparatus further comprises an audio data store containing a plurality of data streams
defining a respective audio signal segment for each node of the network, the audio data store being coupled
with the network data store and processor and controlled to output an audio signal segment data stream on
selection of the respective node.

2. Apparatus as claimed in Claim 1, wherein said character data store contains a plurality of stored attribute values,
the selection from said links to at least two further nodes being determined by the satisfaction of respective different 
combinations of attribute settings. 

3. Apparatus as claimed in Claim 2, wherein the selection of one of a plurality of links to respective further nodes is a
default selection made when none of said respective different combinations of attribute settings is satisfied. 

4. Apparatus as claimed in any of Claims 1 to 3, further comprising a function call store holding a plurality of
commands each specifying a respective processor function, each function call being associated with at least one node
of the network and being called and implemented by the processor on selection of the or each associated node.

5. Apparatus as claimed in Claim 4, wherein one or more further nodes are defined in the network, the or each such 
further node having an associated function call, but no respective audio signal segment. 
6. Apparatus as claimed in Claim 4, wherein the processor is controlled to modify at least one stored attribute value
from said character data store in response to at least one function call. 

7. Apparatus as claimed in any one of Claims 1 to 6, further comprising user-operable input means coupled to said data
processor, with user input affecting the said selection from links to at least two further nodes from one or more
nodes of the network.

8. Apparatus as claimed in Claim 7, wherein the processor alters the setting of at least one stored attribute value in 
dependence on said user input. 


9. Apparatus as claimed in any of Claims 1 to 8, further comprising audio signal reproduction apparatus coupled to 
said audio data store, receiving the data streams output therefrom and reproducing the respective audio signal 
segments defined. 

10. Apparatus as claimed in any of Claims 1 to 9, wherein at least said audio data store comprises a removable storage
device, the apparatus further comprising means for accessing the audio signal segment data streams therefrom. 

11. A removable storage device as in Claim 10, holding said audio signal segment data streams together with the data
of said network data store which data includes indexing information identifying the respective storage locations of
each audio signal segment data stream.

12. A storage device as claimed in Claim 11, wherein said device is an optical disc and said indexing information
identifies track and sector addresses for the audio signal segment data streams.
